
## 🚀 Quick Start

### Task 1: Data weighting in RLHF reward model training

```bash
python SO_Lazy_RM.py 
```

### Task 2: Data weighting in LLM alignment

```bash
python SO_Lazy_LLM.py 
```

---

## Experiment Configuration
The configuration options below apply to both tasks.

Five bilevel algorithms can be tested:
1) SO-Lazy-BiO-I
2) SO-Lazy-BiO-II
3) AmIGO
4) SOBA
5) MA-SOBA


### Use the following settings to run each algorithm

##### SO-Lazy-BiO-I:
* `alg_flag = 0`
* `m_flag = 1`
* `lazy_flag = 1`
* Specify values for `lazy_N` and `mu`


##### SO-Lazy-BiO-II:
* `alg_flag = 0`
* `m_flag = 1`
* `lazy_flag = 0`
* Specify values for `lazy_N` and `mu`

##### AmIGO:
* `alg_flag = 1`
* Specify values for `amigo_M` (number of inner steps updating 'z') and `amigo_N` (number of inner steps updating 'y')

##### SOBA:
* `alg_flag = 2`
* `m_flag = 0`

##### MA-SOBA:
* `alg_flag = 2`
* `m_flag = 1`
* Specify a value for `mu`
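
The five configurations above can be collected into a single lookup table. The following is a minimal sketch, not code from the repository: `ALGORITHM_FLAGS`, `REQUIRED_HYPERPARAMETERS`, and `settings_for` are hypothetical names, and it assumes the flags are ordinary Python variables that you set in the script before running it.

```python
# Hypothetical lookup table; the flag values are copied from the settings listed above.
ALGORITHM_FLAGS = {
    "SO-Lazy-BiO-I": {"alg_flag": 0, "m_flag": 1, "lazy_flag": 1},
    "SO-Lazy-BiO-II": {"alg_flag": 0, "m_flag": 1, "lazy_flag": 0},
    "AmIGO": {"alg_flag": 1},
    "SOBA": {"alg_flag": 2, "m_flag": 0},
    "MA-SOBA": {"alg_flag": 2, "m_flag": 1},
}

# Hyperparameters that must additionally be given a value for each algorithm.
REQUIRED_HYPERPARAMETERS = {
    "SO-Lazy-BiO-I": ["lazy_N", "mu"],
    "SO-Lazy-BiO-II": ["lazy_N", "mu"],
    "AmIGO": ["amigo_M", "amigo_N"],
    "SOBA": [],
    "MA-SOBA": ["mu"],
}


def settings_for(name):
    """Return (flags, required_hyperparameters) for the named algorithm."""
    if name not in ALGORITHM_FLAGS:
        raise KeyError(f"Unknown algorithm: {name!r}")
    return dict(ALGORITHM_FLAGS[name]), list(REQUIRED_HYPERPARAMETERS[name])


flags, required = settings_for("MA-SOBA")
print(flags, required)
```

Note that SOBA and MA-SOBA share `alg_flag = 2` and differ only in `m_flag`, while the two SO-Lazy-BiO variants share `alg_flag = 0` and differ only in `lazy_flag`.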


The following four variables are indexed by `alg_flag`:
1. `num_Ts`: total number of iterations
2. `alphas`: update rate for the upper-level parameter 'x'
3. `betas`: update rate for the lower-level parameter 'y'
4. `gammas`: update rate for the auxiliary parameter 'z'
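
A minimal sketch of this indexing, assuming each variable is a list with one entry per `alg_flag` value (0, 1, 2). The numbers are placeholders for illustration, not the repository's defaults or recommended values.

```python
# Placeholder values only. Position i holds the setting used when alg_flag == i.
num_Ts = [1000, 500, 2000]
alphas = [1e-3, 5e-4, 1e-3]
betas = [1e-2, 5e-3, 1e-2]
gammas = [1e-2, 5e-3, 2e-2]

alg_flag = 2  # selects the SOBA / MA-SOBA entries

# Look up the settings for the chosen algorithm family.
num_T = num_Ts[alg_flag]
alpha = alphas[alg_flag]
beta = betas[alg_flag]
gamma = gammas[alg_flag]

print(num_T, alpha, beta, gamma)
```

Under this assumption, changing `alg_flag` switches all four settings at once, so each algorithm family can keep its own tuned values side by side.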
